# Low-latency processing
Erax WoW Turbo V1.1
MIT
A Whisper Large-v3 Turbo speech recognition model optimized for Vietnamese that also supports multiple languages, with very fast response and high accuracy.
Speech Recognition
Transformers Other

erax-ai
666
11
Erax WoW Turbo V1.0
MIT
A Whisper Large-v3 Turbo speech recognition model optimized for Vietnamese, with real-time transcription support for multiple languages.
Speech Recognition
Transformers Other

erax-ai
655
49
VITA 1.5
VITA-1.5 is a multimodal interaction model designed for real-time vision and voice interaction at a GPT-4o level.
VITA-MLLM
345
40
Speaker Diarization V1
MIT
A speaker segmentation model trained with a powerset multi-class cross-entropy loss. It processes 10-second mono audio and outputs speaker segmentation results.
Speaker Analysis
objects76
13
0
Chester Bennington RVC 1000 Epochs
A model based on RVC (Retrieval-based Voice Conversion) technology, designed to convert input speech into Chester Bennington's vocal style.
Speech Synthesis
Transformers

sail-rvc
2,869
2
Wsj0 2mix Skim Small Causal
A speech enhancement model trained with the ESPnet framework, designed for speech separation on the wsj0_2mix dataset.
Audio Enhancement English
lichenda
26
1
Ai Light Dance Stepmania Ft Wav2vec2 Large Xlsr 53 V5
Apache-2.0
An automatic speech recognition model based on wav2vec2-large-xlsr-53, fine-tuned on the GARY109/AI_LIGHT_DANCE dataset.
Speech Recognition
Transformers

gary109
160
0